Geo-wise Correlation Analysis at County Level
Read the smoothed search signals from Google-Symptoms: anosmia, ageusia, combined_symptoms
Correlations sliced by time
Here we look at Spearman (rank) correlations between our signals and COVID-19 case incidence rates, sliced by time. That is, for each day, we compute the correlation between each signal and COVID-19 case incidence rates, over all counties (with at least 500 cumulative cases).
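As a rough illustration of the per-day computation described above, the following Python sketch computes one Spearman correlation per day across counties. The column names (`geo_value`, `date`, `signal`, `case_rate`, `cumulative_cases`) and the function names are hypothetical, and we assume that a county is eligible when its maximum cumulative case count reaches the threshold; the original analysis may define eligibility differently.

```python
import pandas as pd


def spearman(frame: pd.DataFrame, x: str, y: str) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    pair = frame[[x, y]].dropna()  # rank only the complete pairs
    return pair[x].rank().corr(pair[y].rank())


def correlation_by_time(
    df: pd.DataFrame,
    signal_col: str = "signal",          # hypothetical column name
    cases_col: str = "case_rate",        # hypothetical column name
    cum_col: str = "cumulative_cases",   # hypothetical column name
    min_cases: int = 500,
) -> pd.Series:
    """Return one correlation per day, taken across eligible counties."""
    # Assumption: eligibility is judged on each county's maximum cumulative count.
    max_cum = df.groupby("geo_value")[cum_col].max()
    eligible = max_cum[max_cum >= min_cases].index
    kept = df[df["geo_value"].isin(eligible)]
    return pd.Series(
        {day: spearman(g, signal_col, cases_col) for day, g in kept.groupby("date")}
    )
```

The result is a series indexed by date, which can be plotted directly as a time series for each signal.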
The google-symptoms signals achieve relatively high county-level geo-wise correlations compared with other signals. There is no big difference among the three google-symptoms signals. This suggests that, although only ~100 counties are available, the estimates are of high quality within those areas. There are two sudden drops, in June and August, which are worth further exploration.
Now we look at Spearman (rank) correlations between our signals and COVID-19 case incidence rates, sliced by county. That is, for each county (with at least 500 cumulative cases), we compute the correlation between each signal and COVID-19 case incidence rates, over all time.
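The per-county slice can be sketched in the same way: group by county and correlate the two series over time. This is a minimal sketch that assumes the frame has already been restricted to eligible counties. The column names and the `min_obs` guard (which skips counties with too few paired observations to give a meaningful rank correlation) are assumptions for illustration and are not taken from the original analysis.

```python
import pandas as pd


def correlation_by_county(
    df: pd.DataFrame,
    signal_col: str = "signal",    # hypothetical column name
    cases_col: str = "case_rate",  # hypothetical column name
    min_obs: int = 3,              # hypothetical guard, not from the source
) -> pd.Series:
    """Return one Spearman correlation per county, taken over all days."""
    results = {}
    for county, g in df.groupby("geo_value"):
        pair = g[[signal_col, cases_col]].dropna()
        if len(pair) < min_obs:
            continue  # too few paired days for this county
        # Spearman correlation is the Pearson correlation of the ranks.
        results[county] = pair[signal_col].rank().corr(pair[cases_col].rank())
    return pd.Series(results)
```

The resulting series has one value per county, which is the quantity a histogram or a map of correlations would display.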
We can also look at choropleth maps to get a geographic sense of the correlation distribution for each signal.
We fetch various signals from our API, from April 15 through to the current day.
Here we look at Spearman (rank) correlations between our signals and COVID-19 case incidence rates, sliced by time. That is, for each day, we compute the correlation between each signal and COVID-19 case incidence rates, over all metro areas (with at least 500 cumulative cases).
The google-symptoms signals are not very competitive in geo-wise correlation at the MSA level, which is intuitive since only ~100 counties are available for ~60 MSAs. Recall that we aggregate county-level search volume to the MSA level with a population-weighted average. During this aggregation, counties that fail to meet the privacy or quality thresholds are assigned the value 0 rather than NaN. Thus our MSA-level estimates are not the true values but are biased to some extent, because a large proportion of counties within certain MSAs have missing values.
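To make the effect of zero-filling concrete, here is a small Python sketch of a population-weighted average from county level to MSA level. The column names (`msa`, `population`, `value`) and the function are hypothetical. The variant that skips missing counties is shown only for comparison and is not the aggregation used in this analysis.

```python
import pandas as pd


def msa_weighted_mean(counties: pd.DataFrame, zero_fill: bool) -> pd.Series:
    """Population-weighted mean of `value` per MSA.

    zero_fill=True treats missing counties as 0 (as described in the text);
    zero_fill=False drops them from both numerator and denominator.
    """
    df = counties.copy()
    if zero_fill:
        df["value"] = df["value"].fillna(0.0)
    else:
        df = df.dropna(subset=["value"])
    weighted_sum = (df["value"] * df["population"]).groupby(df["msa"]).sum()
    total_pop = df["population"].groupby(df["msa"]).sum()
    return weighted_sum / total_pop


# Toy data: in M1 the larger county is missing; M2 is fully observed.
toy = pd.DataFrame({
    "msa": ["M1", "M1", "M2", "M2"],
    "population": [100, 300, 50, 150],
    "value": [2.0, float("nan"), 4.0, 8.0],
})
zero_filled = msa_weighted_mean(toy, zero_fill=True)
nan_skipped = msa_weighted_mean(toy, zero_fill=False)
```

In this toy example the zero-filled estimate for M1 is 0.5, while the estimate from the observed county alone is 2.0. M2 has no missing counties and is 7.0 either way. The more of an MSA's population that sits in missing counties, the further the zero-filled value is pulled toward 0.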
Now we look at Spearman (rank) correlations between our signals and COVID-19 case incidence rates, sliced by metro area. That is, for each metro area (with at least 500 cumulative cases), we compute the correlation between each signal and COVID-19 case incidence rates, over all time.